Descriptive Visualizations - Python

Analyzing and Predicting the Spread of the Coronavirus with Machine Learning

  • Exploratory Data Analysis of COVID19 dataset

  • Analyzing present condition of COVID19

  • COVID19 Outbreak - Data Visualization

  • COVID19 Outbreak - Prediction using Machine Learning

EDA of COVID19 dataset

  • Step 1: Importing Libraries
In [1]:
#Install packages (uncomment to run)
#pip.main() is no longer available in recent pip releases; use the %pip magic instead
#%pip install --upgrade scikit-learn fbprophet
In [2]:
#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

EDA of COVID19 dataset

  • Step 2: Importing Data
In [3]:
#Reading the data with pandas
data=pd.read_csv('C:\\Jupyter\\covid_19_data.csv' ,parse_dates=['Last Update'])
#Renaming the columns
data.rename(columns={'ObservationDate':'Date', 'Country/Region':'Country'}, inplace=True)
#Showing the first rows of the data frame
data.head()
Out[3]:
SNo Date Province/State Country Last Update Confirmed Deaths Recovered
0 1 01/22/2020 Anhui Mainland China 2020-01-22 17:00:00 1.0 0.0 0.0
1 2 01/22/2020 Beijing Mainland China 2020-01-22 17:00:00 14.0 0.0 0.0
2 3 01/22/2020 Chongqing Mainland China 2020-01-22 17:00:00 6.0 0.0 0.0
3 4 01/22/2020 Fujian Mainland China 2020-01-22 17:00:00 1.0 0.0 0.0
4 5 01/22/2020 Gansu Mainland China 2020-01-22 17:00:00 0.0 0.0 0.0

EDA of COVID19 dataset

  • Step 3: Data understanding
In [4]:
#Check for number of rows & columns
print("No. of Rows & Columns:" + str(data.shape))
#checking if null is present in dataset
print(data.isnull().sum())
No. of Rows & Columns:(11930, 8)
SNo                  0
Date                 0
Province/State    5663
Country              0
Last Update          0
Confirmed            0
Deaths               0
Recovered            0
dtype: int64
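
The null count above shows that Province/State is the only column with missing values (5663 of 11930 rows). A minimal sketch of one way to quantify and label those gaps before grouping, using a small made-up frame; the placeholder label 'Unknown' is an assumption and not part of the original notebook:

```python
import pandas as pd

# Toy stand-in for the real dataset (values are illustrative only).
toy = pd.DataFrame({
    "Province/State": ["Anhui", None, "Beijing", None],
    "Country": ["Mainland China", "Italy", "Mainland China", "Spain"],
    "Confirmed": [1.0, 5.0, 14.0, 3.0],
})

# Share of missing values per column, as a percentage.
missing_pct = toy.isnull().mean() * 100

# Replace missing provinces with an explicit placeholder (hypothetical label).
toy["Province/State"] = toy["Province/State"].fillna("Unknown")

print(missing_pct)
print(toy)
```

Because the later steps group by Country and Date only, the missing provinces do not affect the country-level totals; the placeholder mainly helps if the data is ever grouped by province.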

Analyzing present condition of COVID19

  • Step 1: Up-to-date COVID19 status country-wise
In [5]:
#Grouping the data by Date and Country (the grouping keys must not be repeated in the column selection)
df = data.groupby(["Date", "Country"])[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index()
#Sorting the data by confirmed cases, highest first
sort_By_Confirmed_cases=df.sort_values('Confirmed',ascending=False)
#Keeping one row per country: the one with its highest (cumulative) confirmed count
sort_By_Confirmed_cases=sort_By_Confirmed_cases.drop_duplicates('Country')
sort_By_Confirmed_cases.head()
Out[5]:
Date Country Confirmed Deaths Recovered
6234 04/04/2020 US 308850.0 8407.0 14652.0
6218 04/04/2020 Spain 126168.0 11947.0 34219.0
6145 04/04/2020 Italy 124632.0 15362.0 20996.0
6125 04/04/2020 Germany 96092.0 1444.0 26400.0
6121 04/04/2020 France 90848.0 7574.0 15572.0
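
Sorting by Confirmed and dropping duplicate countries works here because the counts are cumulative, so the largest value per country is normally also the latest one. A hedged alternative sketch that selects the most recent row per country explicitly; it assumes the Date strings use the MM/DD/YYYY form shown in the output above, and the country names and values in the toy frame are made up:

```python
import pandas as pd

# Toy data: two dates for each of two placeholder countries.
toy = pd.DataFrame({
    "Date": ["04/03/2020", "04/04/2020", "04/03/2020", "04/04/2020"],
    "Country": ["Country A", "Country A", "Country B", "Country B"],
    "Confirmed": [10.0, 12.0, 7.0, 9.0],
})

# Parse the date strings so ordering is chronological, not alphabetical.
toy["Date"] = pd.to_datetime(toy["Date"], format="%m/%d/%Y")

# Keep the last (most recent) row per country, then rank by confirmed cases.
latest = (
    toy.sort_values("Date")
       .groupby("Country")
       .tail(1)
       .sort_values("Confirmed", ascending=False)
       .reset_index(drop=True)
)
print(latest)
```

This variant does not rely on the counts being monotonically increasing, which matters if a source ever revises a country's total downward.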

Analyzing present condition of COVID19

  • Step 2: Calculating the number of Confirmed, Death, Recovered & Active cases around the world
In [6]:
#World totals of confirmed, death & recovered cases
Confirmed_cases_for_world=sort_By_Confirmed_cases['Confirmed'].sum()
Deaths_cases_for_world=sort_By_Confirmed_cases['Deaths'].sum()
Recovered_cases_for_world=sort_By_Confirmed_cases['Recovered'].sum()
#Active cases = confirmed cases minus deaths and recovered cases
Active_cases=Confirmed_cases_for_world-Deaths_cases_for_world-Recovered_cases_for_world
print("Confirmed cases around the world: " + str(Confirmed_cases_for_world)) 
print("Deaths around the world: " + str(Deaths_cases_for_world))
print("Recovered cases around the world: " + str(Recovered_cases_for_world)) 
print("Active cases around the world: " + str(Active_cases))
Confirmed cases around the world: 1198380.0
Deaths around the world: 64613.0
Recovered cases around the world: 246456.0
Active cases around the world: 887311.0
In [7]:
#Finding the world-wide death rate (%)
Deaths_rate=(Deaths_cases_for_world*100)/Confirmed_cases_for_world
#Finding the world-wide recovery rate (%)
Recovered_rate=(Recovered_cases_for_world*100)/Confirmed_cases_for_world
#Latest row for Mainland China only
China=sort_By_Confirmed_cases[sort_By_Confirmed_cases['Country']=='Mainland China']
#Use .iloc[0] to read the single value; int() on a one-element array is deprecated in recent NumPy
Recovered_rate_for_china=(China['Recovered'].iloc[0]*100)/China['Confirmed'].iloc[0]
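
The rates above are plain percentages of the confirmed count. A small sketch of the same arithmetic wrapped in a helper that also guards against a zero denominator; the function name `percent_of` is hypothetical, and the inputs are the world totals printed earlier:

```python
def percent_of(part, total):
    """Return part as a percentage of total, or 0.0 when total is zero."""
    if total == 0:
        return 0.0
    return part * 100 / total

# World totals taken from the output above.
confirmed = 1198380.0
deaths = 64613.0
recovered = 246456.0

death_rate = percent_of(deaths, confirmed)
recovery_rate = percent_of(recovered, confirmed)
active = confirmed - deaths - recovered

print(round(death_rate, 4), round(recovery_rate, 4), active)
```

The zero guard becomes relevant when the same formula is applied per country, where a row with no confirmed cases would otherwise produce a division error or NaN.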

Analyzing present condition of COVID19

  • Step 3: World-wide statistics of COVID-19
In [8]:
#mapping all the data in a table 
Set1={'Total Number of Confirmed cases in the World':Confirmed_cases_for_world,'Total Number of Death cases in the World':Deaths_cases_for_world,'Total Number of Recovered cases in the World':Recovered_cases_for_world,'Total Number of Active Cases':Active_cases,
      'Rate of the Recovered Cases in the world':Recovered_rate,'Rate of the Death Cases in the world %':Deaths_rate,'Rate of the Recovered Cases in the China  %':Recovered_rate_for_china}
Set1=pd.DataFrame.from_dict(Set1, orient='index' ,columns=['Total'])
print("Data is till 04/04/2020") 
Set1.style.background_gradient(cmap='Reds')
Data is till 04/04/2020
Out[8]:
Total
Total Number of Confirmed cases in the World 1.19838e+06
Total Number of Death cases in the World 64613
Total Number of Recovered cases in the World 246456
Total Number of Active Cases 887311
Rate of the Recovered Cases in the world 20.5658
Rate of the Death Cases in the world % 5.3917
Rate of the Recovered Cases in the China % 94.0285
In [9]:
#Finding the recovery rate (%) per country
Recovered_rate=(sort_By_Confirmed_cases['Recovered']*100)/sort_By_Confirmed_cases['Confirmed']
#Finding the death rate (%) per country
Deaths_rate=(sort_By_Confirmed_cases['Deaths']*100)/sort_By_Confirmed_cases['Confirmed']
#Finding each country's share (%) of the world's confirmed cases
cases_rate=(sort_By_Confirmed_cases.Confirmed*100)/Confirmed_cases_for_world


sort_By_Confirmed_cases['Active Cases']=sort_By_Confirmed_cases['Confirmed']-sort_By_Confirmed_cases['Deaths']-sort_By_Confirmed_cases['Recovered']
sort_By_Confirmed_cases['% of Recovered cases']=Recovered_rate
sort_By_Confirmed_cases['% of Death cases']=Deaths_rate
sort_By_Confirmed_cases['% of Total cases']=cases_rate


print("Sorted By Confirmed Cases")
#sort_By_Confirmed_cases.style.background_gradient(cmap='Reds')

#The subset names must match existing columns, hence '% of Total cases'
sort_By_Confirmed_cases.style.background_gradient(cmap="Blues", subset=['Confirmed', 'Active Cases','% of Total cases'])\
            .background_gradient(cmap="Greens", subset=['Recovered','% of Recovered cases'])\
            .background_gradient(cmap="Reds", subset=['Deaths','% of Death cases'])
Sorted By Confirmed Cases
Out[9]:
Date Country Confirmed Deaths Recovered Active Cases % of Recovered cases % of Death cases % of Total cases
6234 04/04/2020 US 308850 8407 14652 285791 4.74405 2.72203 25.7723
6218 04/04/2020 Spain 126168 11947 34219 80002 27.1218 9.46912 10.5282
6145 04/04/2020 Italy 124632 15362 20996 88274 16.8464 12.3259 10.4
6125 04/04/2020 Germany 96092 1444 26400 68248 27.4737 1.50273 8.01849
6121 04/04/2020 France 90848 7574 15572 67702 17.1407 8.337 7.5809
6166 04/04/2020 Mainland China 81638 3326 76763 1549 94.0285 4.07408 6.81236
6141 04/04/2020 Iran 55743 3452 19736 32555 35.4053 6.19271 4.65153
6233 04/04/2020 UK 42477 4320 215 37942 0.506156 10.1702 3.54454
6232 04/04/2020 Turkey 23934 501 786 22647 3.28403 2.09326 1.9972
6223 04/04/2020 Switzerland 20505 666 6415 13424 31.2851 3.24799 1.71106
6078 04/04/2020 Belgium 18431 1283 3247 13901 17.6171 6.9611 1.53799
6183 04/04/2020 Netherlands 16727 1656 262 14809 1.56633 9.90016 1.3958
6094 04/04/2020 Canada 12978 218 2577 10183 19.8567 1.67977 1.08296
6071 04/04/2020 Austria 11781 186 2507 9088 21.28 1.57881 0.983077
6198 04/04/2020 Portugal 10524 266 75 10183 0.712657 2.52756 0.878186
6085 04/04/2020 Brazil 10360 445 127 9788 1.22587 4.29537 0.8645
6217 04/04/2020 South Korea 10156 177 6325 3654 62.2785 1.74281 0.847477
6144 04/04/2020 Israel 7851 44 427 7380 5.4388 0.560438 0.655134
6222 04/04/2020 Sweden 6443 373 205 5865 3.18175 5.78923 0.537642
6189 04/04/2020 Norway 5550 62 32 5456 0.576577 1.11712 0.463125
6070 04/04/2020 Australia 5550 30 701 4819 12.6306 0.540541 0.463125
6201 04/04/2020 Russia 4731 43 333 4355 7.03868 0.908899 0.394783
6143 04/04/2020 Ireland 4604 137 25 4442 0.543006 2.97567 0.384185
6105 04/04/2020 Czech Republic 4472 59 78 4335 1.74419 1.31932 0.37317
6106 04/04/2020 Denmark 4269 161 1379 2729 32.3026 3.77138 0.356231
6097 04/04/2020 Chile 4161 27 528 3606 12.6893 0.648882 0.347219
6197 04/04/2020 Poland 3627 79 116 3432 3.19824 2.17811 0.302659
6200 04/04/2020 Romania 3613 146 329 3138 9.10601 4.04096 0.30149
6168 04/04/2020 Malaysia 3483 57 915 2511 26.2705 1.63652 0.290642
6111 04/04/2020 Ecuador 3465 172 100 3193 2.886 4.96392 0.28914
6148 04/04/2020 Japan 3139 77 514 2548 16.3746 2.45301 0.261937
6196 04/04/2020 Philippines 3094 144 57 2893 1.84228 4.65417 0.258182
6139 04/04/2020 India 3082 86 229 2767 7.43024 2.7904 0.257181
6191 04/04/2020 Pakistan 2818 41 131 2646 4.64869 1.45493 0.235151
6162 04/04/2020 Luxembourg 2729 31 500 2198 18.3217 1.13595 0.227724
6207 04/04/2020 Saudi Arabia 2179 29 420 1730 19.2749 1.33089 0.181829
6140 04/04/2020 Indonesia 2092 191 150 1751 7.17017 9.13002 0.174569
6227 04/04/2020 Thailand 2067 20 674 1373 32.6076 0.967586 0.172483
6120 04/04/2020 Finland 1882 25 300 1557 15.9405 1.32837 0.157045
6195 04/04/2020 Peru 1746 73 914 759 52.3482 4.18099 0.145697
6174 04/04/2020 Mexico 1688 60 633 995 37.5 3.5545 0.140857
6127 04/04/2020 Greece 1673 68 78 1527 4.66228 4.06455 0.139605
6192 04/04/2020 Panama 1673 41 13 1619 0.777047 2.45069 0.139605
6209 04/04/2020 Serbia 1624 44 0 1580 0 2.70936 0.135516
6216 04/04/2020 South Africa 1585 9 95 1481 5.99369 0.567823 0.132262
6237 04/04/2020 United Arab Emirates 1505 10 125 1370 8.30565 0.664452 0.125586
5927 04/03/2020 Dominican Republic 1488 68 16 1404 1.07527 4.56989 0.124168
6068 04/04/2020 Argentina 1451 43 279 1129 19.2281 2.96347 0.12108
6138 04/04/2020 Iceland 1417 4 396 1017 27.9464 0.282287 0.118243
6098 04/04/2020 Colombia 1406 32 85 1289 6.04552 2.27596 0.117325
6199 04/04/2020 Qatar 1325 3 109 1213 8.22642 0.226415 0.110566
6064 04/04/2020 Algeria 1251 130 90 1031 7.19424 10.3917 0.104391
6236 04/04/2020 Ukraine 1225 32 25 1168 2.04082 2.61224 0.102221
6212 04/04/2020 Singapore 1189 6 297 886 24.979 0.504626 0.0992173
6102 04/04/2020 Croatia 1126 12 119 995 10.5684 1.06572 0.0939602
6112 04/04/2020 Egypt 1070 71 241 758 22.5234 6.63551 0.0892872
6116 04/04/2020 Estonia 1039 13 59 967 5.67854 1.2512 0.0867004
6214 04/04/2020 Slovenia 977 22 79 876 8.08598 2.25179 0.0815267
6184 04/04/2020 New Zealand 950 1 127 822 13.3684 0.105263 0.0792737
6179 04/04/2020 Morocco 919 59 66 794 7.18172 6.42002 0.0766869
6142 04/04/2020 Iraq 878 56 259 563 29.4989 6.37813 0.0732656
6136 04/04/2020 Hong Kong 862 4 173 685 20.0696 0.464037 0.0719304
6161 04/04/2020 Lithuania 771 11 7 753 0.907912 1.42672 0.0643369
6069 04/04/2020 Armenia 770 7 43 720 5.58442 0.909091 0.0642534
6175 04/04/2020 Moldova 752 12 29 711 3.85638 1.59574 0.0627514
4837 03/28/2020 Diamond Princess 712 10 597 105 83.8483 1.40449 0.0594135
3346 03/19/2020 Others 712 7 325 380 45.6461 0.983146 0.0594135
6074 04/04/2020 Bahrain 688 4 423 261 61.4826 0.581395 0.0574108
6137 04/04/2020 Hungary 678 32 58 588 8.55457 4.71976 0.0565764
6083 04/04/2020 Bosnia and Herzegovina 624 21 30 573 4.80769 3.36538 0.0520703
6093 04/04/2020 Cameroon 555 9 17 529 3.06306 1.62162 0.0463125
6231 04/04/2020 Tunisia 553 18 5 530 0.904159 3.25497 0.0461456
6150 04/04/2020 Kazakhstan 531 5 36 490 6.77966 0.94162 0.0443098
6072 04/04/2020 Azerbaijan 521 5 32 484 6.14203 0.959693 0.0434754
6157 04/04/2020 Lebanon 520 17 54 449 10.3846 3.26923 0.0433919
6156 04/04/2020 Latvia 509 1 1 507 0.196464 0.196464 0.042474
6087 04/04/2020 Bulgaria 503 17 34 452 6.75944 3.37972 0.0419733
6188 04/04/2020 North Macedonia 483 17 20 446 4.14079 3.51967 0.0403044
6153 04/04/2020 Kuwait 479 1 93 385 19.4154 0.208768 0.0399706
6213 04/04/2020 Slovakia 471 1 10 460 2.12314 0.212314 0.0393031
6065 04/04/2020 Andorra 466 17 21 428 4.50644 3.64807 0.0388858
6077 04/04/2020 Belarus 440 5 53 382 12.0455 1.13636 0.0367162
6101 04/04/2020 Costa Rica 435 2 13 420 2.98851 0.45977 0.036299
6104 04/04/2020 Cyprus 426 11 33 382 7.74648 2.58216 0.035548
6238 04/04/2020 Uruguay 400 5 93 302 23.25 1.25 0.0333784
6225 04/04/2020 Taiwan 355 5 50 300 14.0845 1.40845 0.0296233
6063 04/04/2020 Albania 333 20 99 214 29.7297 6.00601 0.0277875
6149 04/04/2020 Jordan 323 5 74 244 22.9102 1.54799 0.0269531
6088 04/04/2020 Burkina Faso 318 16 66 236 20.7547 5.03145 0.0265358
6062 04/04/2020 Afghanistan 299 7 10 282 3.34448 2.34114 0.0249503
6103 04/04/2020 Cuba 288 6 15 267 5.20833 2.08333 0.0240324
6190 04/04/2020 Oman 277 2 61 214 22.0217 0.722022 0.0231145
6239 04/04/2020 Uzbekistan 266 2 25 239 9.3985 0.75188 0.0221966
6135 04/04/2020 Honduras 264 15 3 246 1.13636 5.68182 0.0220297
6206 04/04/2020 San Marino 259 32 27 200 10.4247 12.3552 0.0216125
6146 04/04/2020 Ivory Coast 245 1 25 219 10.2041 0.408163 0.0204443
6241 04/04/2020 Vietnam 240 0 90 150 37.5 0 0.020027
6208 04/04/2020 Senegal 219 2 72 145 32.8767 0.913242 0.0182747
6242 04/04/2020 West Bank and Gaza 217 1 21 195 9.67742 0.460829 0.0181078
6187 04/04/2020 Nigeria 214 4 25 185 11.6822 1.86916 0.0178574
6171 04/04/2020 Malta 213 0 2 211 0.938967 0 0.017774
5943 04/03/2020 Ghana 205 5 31 169 15.122 2.43902 0.0171064
6178 04/04/2020 Montenegro 201 2 1 198 0.497512 0.995025 0.0167726
6173 04/04/2020 Mauritius 196 7 7 182 3.57143 3.57143 0.0163554
6219 04/04/2020 Sri Lanka 166 5 27 134 16.2651 3.01205 0.013852
6124 04/04/2020 Georgia 162 1 36 125 22.2222 0.617284 0.0135182
6240 04/04/2020 Venezuela 155 7 52 96 33.5484 4.51613 0.0129341
6100 04/04/2020 Congo (Kinshasa) 154 18 3 133 1.94805 11.6883 0.0128507
6154 04/04/2020 Kyrgyzstan 144 1 9 134 6.25 0.694444 0.0120162
6186 04/04/2020 Niger 144 8 0 136 0 5.55556 0.0120162
6082 04/04/2020 Bolivia 139 10 1 128 0.719424 7.19424 0.011599
6086 04/04/2020 Brunei 135 1 66 68 48.8889 0.740741 0.0112652
6152 04/04/2020 Kosovo 135 1 16 118 11.8519 0.740741 0.0112652
6151 04/04/2020 Kenya 126 4 4 118 3.1746 3.1746 0.0105142
6092 04/04/2020 Cambodia 114 0 50 64 43.8596 0 0.00951284
6130 04/04/2020 Guinea 111 0 5 106 4.5045 0 0.0092625
6230 04/04/2020 Trinidad and Tobago 103 6 1 96 0.970874 5.82524 0.00859494
6202 04/04/2020 Rwanda 102 0 0 102 0 0 0.00851149
6194 04/04/2020 Paraguay 96 3 12 81 12.5 3.125 0.00801081
6160 04/04/2020 Liechtenstein 77 1 0 76 0 1.2987 0.00642534
6165 04/04/2020 Madagascar 70 0 0 70 0 0 0.00584122
6075 04/04/2020 Bangladesh 70 8 30 32 42.8571 11.4286 0.00584122
6176 04/04/2020 Monaco 66 1 3 62 4.54545 1.51515 0.00550744
6129 04/04/2020 Guatemala 61 2 15 44 24.5902 3.27869 0.00509021
6113 04/04/2020 El Salvador 56 3 2 51 3.57143 5.35714 0.00467298
3638 03/21/2020 Guadeloupe 53 0 0 53 0 0 0.00442264
6147 04/04/2020 Jamaica 53 3 7 43 13.2075 5.66038 0.00442264
6076 04/04/2020 Barbados 52 0 0 52 0 0 0.00433919
6108 04/04/2020 Djibouti 50 0 8 42 16 0 0.0041723
6052 04/03/2020 Uganda 48 0 0 48 0 0 0.00400541
3711 03/21/2020 Reunion 45 0 0 45 0 0 0.00375507
5981 04/03/2020 Macau 43 0 10 33 23.2558 0 0.00358818
6229 04/04/2020 Togo 41 3 17 21 41.4634 7.31707 0.00342129
6170 04/04/2020 Mali 41 3 1 37 2.43902 7.31707 0.00342129
6243 04/04/2020 Zambia 39 1 2 36 5.12821 2.5641 0.00325439
6118 04/04/2020 Ethiopia 38 0 4 34 10.5263 0 0.00317095
3500 03/20/2020 Martinique 32 1 0 31 0 3.125 0.00267027
6115 04/04/2020 Eritrea 29 0 0 29 0 0 0.00241993
6073 04/04/2020 Bahamas 28 4 0 24 0 14.2857 0.00233649
2086 03/10/2020 occupied Palestinian territory 25 0 0 25 0 0 0.00208615
6132 04/04/2020 Guyana 23 4 0 19 0 17.3913 0.00191926
1721 03/07/2020 Palestine 22 0 0 22 0 0 0.00183581
5733 04/02/2020 Congo (Brazzaville) 22 2 2 18 9.09091 9.09091 0.00183581
5756 04/02/2020 Gabon 21 1 0 20 0 4.7619 0.00175237
6089 04/04/2020 Burma 21 1 0 20 0 4.7619 0.00175237
1834 03/08/2020 Republic of Ireland 21 0 0 21 0 0 0.00175237
5860 04/02/2020 Tanzania 20 1 2 17 10 5 0.00166892
6133 04/04/2020 Haiti 20 0 1 19 5 0 0.00166892
5803 04/02/2020 Maldives 19 0 13 6 68.4211 0 0.00158547
3630 03/21/2020 French Guiana 18 0 6 12 33.3333 0 0.00150203
6131 04/04/2020 Guinea-Bissau 18 0 0 18 0 0 0.00150203
6159 04/04/2020 Libya 18 1 0 17 0 5.55556 0.00150203
6041 04/03/2020 Syria 16 2 0 14 0 12.5 0.00133514
5897 04/03/2020 Benin 16 0 2 14 12.5 0 0.00133514
5931 04/03/2020 Equatorial Guinea 16 0 1 15 6.25 0 0.00133514
5884 04/03/2020 Antigua and Barbuda 15 0 0 15 0 0 0.00125169
5628 04/01/2020 Mongolia 14 0 2 12 14.2857 0 0.00116824
6109 04/04/2020 Dominica 14 0 0 14 0 0 0.00116824
5632 04/01/2020 Namibia 14 0 2 12 14.2857 0 0.00116824
6204 04/04/2020 Saint Lucia 14 0 1 13 7.14286 0 0.00116824
5945 04/03/2020 Grenada 12 0 0 12 0 0 0.00100135
6119 04/04/2020 Fiji 12 0 0 12 0 0 0.00100135
6037 04/03/2020 Sudan 10 2 2 6 20 20 0.00083446
6180 04/04/2020 Mozambique 10 0 1 9 10 0 0.00083446
6066 04/04/2020 Angola 10 2 2 6 20 20 0.00083446
6038 04/03/2020 Suriname 10 1 0 9 0 10 0.00083446
6027 04/03/2020 Seychelles 10 0 0 10 0 0 0.00083446
6155 04/04/2020 Laos 10 0 0 10 0 0 0.00083446
6158 04/04/2020 Liberia 10 1 3 6 30 10 0.00083446
6163 04/04/2020 MS Zaandam 9 2 0 7 0 22.2222 0.000751014
6117 04/04/2020 Eswatini 9 0 0 9 0 0 0.000751014
6182 04/04/2020 Nepal 9 0 1 8 11.1111 0 0.000751014
6096 04/04/2020 Chad 9 0 0 9 0 0 0.000751014
5837 04/02/2020 Saint Kitts and Nevis 9 0 0 9 0 0 0.000751014
6061 04/03/2020 Zimbabwe 9 1 0 8 0 11.1111 0.000751014
6095 04/04/2020 Central African Republic 8 0 0 8 0 0 0.000667568
5951 04/03/2020 Holy See 7 0 0 7 0 0 0.000584122
3682 03/21/2020 Mayotte 7 0 0 7 0 0 0.000584122
6032 04/03/2020 Somalia 7 0 1 6 14.2857 0 0.000584122
6091 04/04/2020 Cabo Verde 7 1 0 6 0 14.2857 0.000584122
6205 04/04/2020 Saint Vincent and the Grenadines 7 0 1 6 14.2857 0 0.000584122
5441 03/31/2020 Mauritania 6 1 2 3 33.3333 16.6667 0.000500676
6002 04/03/2020 Nicaragua 5 1 0 4 0 20 0.00041723
5898 04/03/2020 Bhutan 5 0 2 3 40 0 0.00041723
5757 04/02/2020 Gambia 4 1 2 1 50 25 0.000333784
3585 03/21/2020 Bahamas, The 4 0 0 4 0 0 0.000333784
5354 03/31/2020 Botswana 4 1 0 3 0 25 0.000333784
5896 04/03/2020 Belize 4 0 0 4 0 0 0.000333784
3070 03/18/2020 Aruba 4 0 0 4 0 0 0.000333784
6211 04/04/2020 Sierra Leone 4 0 0 4 0 0 0.000333784
6167 04/04/2020 Malawi 4 0 0 4 0 0 0.000333784
2859 03/16/2020 Puerto Rico 3 0 0 3 0 0 0.000250338
1627 03/06/2020 Saint Barthelemy 3 0 0 3 0 0 0.000250338
2957 03/17/2020 Guam 3 0 0 3 0 0 0.000250338
5724 04/02/2020 Burundi 3 0 0 3 0 0 0.000250338
2520 03/14/2020 Jersey 2 0 0 2 0 0 0.000166892
1893 03/09/2020 Faroe Islands 2 0 0 2 0 0 0.000166892
1972 03/10/2020 ('St. Martin',) 2 0 0 2 0 0 0.000166892
1959 03/09/2020 St. Martin 2 0 0 2 0 0 0.000166892
6193 04/04/2020 Papua New Guinea 1 0 0 1 0 0 8.3446e-05
3048 03/17/2020 The Bahamas 1 0 0 1 0 0 8.3446e-05
3049 03/17/2020 The Gambia 1 0 0 1 0 0 8.3446e-05
2800 03/16/2020 Guernsey 1 0 0 1 0 0 8.3446e-05
5862 04/02/2020 Timor-Leste 1 0 0 1 0 0 8.3446e-05
2861 03/16/2020 Republic of the Congo 1 0 0 1 0 0 8.3446e-05
2955 03/17/2020 Greenland 1 0 0 1 0 0 8.3446e-05
2629 03/15/2020 Curacao 1 0 0 1 0 0 8.3446e-05
2475 03/14/2020 Cayman Islands 1 0 0 1 0 0 8.3446e-05
1859 03/08/2020 Vatican City 1 0 0 1 0 0 8.3446e-05
1788 03/08/2020 Gibraltar 1 0 0 1 0 0 8.3446e-05
1069 02/28/2020 North Ireland 1 0 0 1 0 0 8.3446e-05
1030 02/28/2020 Azerbaijan 1 0 0 1 0 0 8.3446e-05
3282 03/19/2020 Gambia, The 1 0 0 1 0 0 8.3446e-05
3603 03/21/2020 Cape Verde 1 0 0 1 0 0 8.3446e-05
1995 03/10/2020 Channel Islands 1 0 0 1 0 0 8.3446e-05
3618 03/21/2020 East Timor 1 0 0 1 0 0 8.3446e-05
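
The table above lists several places under more than one spelling (for example 'Bahamas', 'Bahamas, The' and 'The Bahamas', or 'Gambia', 'Gambia, The' and 'The Gambia'), so drop_duplicates('Country') treats them as separate countries and their stale counts are added to the world totals. One possible clean-up, sketched under the assumption that these variants refer to the same place; the mapping `ALIASES` is hypothetical and deliberately incomplete, and it would need to be applied before the grouping step:

```python
import pandas as pd

# Hypothetical, incomplete alias table built from spellings seen in the output above.
ALIASES = {
    "Bahamas, The": "Bahamas",
    "The Bahamas": "Bahamas",
    "Gambia, The": "Gambia",
    "The Gambia": "Gambia",
    "('St. Martin',)": "St. Martin",
}

# The three Bahamas rows as they appear in the table above.
toy = pd.DataFrame({
    "Date": ["03/17/2020", "03/21/2020", "04/04/2020"],
    "Country": ["The Bahamas", "Bahamas, The", "Bahamas"],
    "Confirmed": [1.0, 4.0, 28.0],
})

# Strip stray whitespace, then map known variants to a single name.
toy["Country"] = toy["Country"].str.strip().replace(ALIASES)

# Same selection as in the notebook: highest confirmed count per country.
latest = toy.sort_values("Confirmed", ascending=False).drop_duplicates("Country")
print(latest)
```

The `str.strip()` call is included because 'Azerbaijan' also appears twice in the table, which suggests a whitespace difference in the raw data (an assumption, since the rendered output does not show it).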

Analyzing present condition of COVID19

  • Step 3: Displaying country-wise statistics of COVID-19 after adding the Active Cases, % of Recovered cases, % of Death cases & % of Total cases columns
In [10]:
sort_By_Confirmed_cases.head()
Out[10]:
Date Country Confirmed Deaths Recovered Active Cases % of Recovered cases % of Death cases % of Total cases
6234 04/04/2020 US 308850.0 8407.0 14652.0 285791.0 4.744051 2.722033 25.772293
6218 04/04/2020 Spain 126168.0 11947.0 34219.0 80002.0 27.121774 9.469121 10.528213
6145 04/04/2020 Italy 124632.0 15362.0 20996.0 88274.0 16.846396 12.325887 10.400040
6125 04/04/2020 Germany 96092.0 1444.0 26400.0 68248.0 27.473671 1.502727 8.018492
6121 04/04/2020 France 90848.0 7574.0 15572.0 67702.0 17.140719 8.337002 7.580901
In [11]:
#finding the total number of confirmed cases in China till date 
sort_By_Confirmed_cases[sort_By_Confirmed_cases['Country']=='Mainland China']
Out[11]:
Date Country Confirmed Deaths Recovered Active Cases % of Recovered cases % of Death cases % of Total cases
6166 04/04/2020 Mainland China 81638.0 3326.0 76763.0 1549.0 94.028516 4.074083 6.812363
In [12]:
#Group by country and sort by confirmed cases; numeric_only=True leaves out the Date strings
df=sort_By_Confirmed_cases.groupby(['Country']).sum(numeric_only=True).sort_values(by ='Confirmed',ascending=False)
df.reset_index(level=0, inplace=True)
df.head()
Out[12]:
Country Confirmed Deaths Recovered Active Cases % of Recovered cases % of Death cases % of Total cases
0 US 308850.0 8407.0 14652.0 285791.0 4.744051 2.722033 25.772293
1 Spain 126168.0 11947.0 34219.0 80002.0 27.121774 9.469121 10.528213
2 Italy 124632.0 15362.0 20996.0 88274.0 16.846396 12.325887 10.400040
3 Germany 96092.0 1444.0 26400.0 68248.0 27.473671 1.502727 8.018492
4 France 90848.0 7574.0 15572.0 67702.0 17.140719 8.337002 7.580901

Analyzing present condition of COVID19

  • Step 4: Sorting countries based on Confirmed cases
In [13]:
#Select the countries with at least one confirmed case, then sort by confirmed cases
Country_wise_Confirmed = sort_By_Confirmed_cases[sort_By_Confirmed_cases['Confirmed']>0][['Country', 'Confirmed']]

Country_wise_Confirmed.sort_values('Confirmed', ascending=False).reset_index(drop=True).style.background_gradient(cmap='Greens')
Out[13]:
Country Confirmed
0 US 308850
1 Spain 126168
2 Italy 124632
3 Germany 96092
4 France 90848
5 Mainland China 81638
6 Iran 55743
7 UK 42477
8 Turkey 23934
9 Switzerland 20505
10 Belgium 18431
11 Netherlands 16727
12 Canada 12978
13 Austria 11781
14 Portugal 10524
15 Brazil 10360
16 South Korea 10156
17 Israel 7851
18 Sweden 6443
19 Norway 5550
20 Australia 5550
21 Russia 4731
22 Ireland 4604
23 Czech Republic 4472
24 Denmark 4269
25 Chile 4161
26 Poland 3627
27 Romania 3613
28 Malaysia 3483
29 Ecuador 3465
30 Japan 3139
31 Philippines 3094
32 India 3082
33 Pakistan 2818
34 Luxembourg 2729
35 Saudi Arabia 2179
36 Indonesia 2092
37 Thailand 2067
38 Finland 1882
39 Peru 1746
40 Mexico 1688
41 Panama 1673
42 Greece 1673
43 Serbia 1624
44 South Africa 1585
45 United Arab Emirates 1505
46 Dominican Republic 1488
47 Argentina 1451
48 Iceland 1417
49 Colombia 1406
50 Qatar 1325
51 Algeria 1251
52 Ukraine 1225
53 Singapore 1189
54 Croatia 1126
55 Egypt 1070
56 Estonia 1039
57 Slovenia 977
58 New Zealand 950
59 Morocco 919
60 Iraq 878
61 Hong Kong 862
62 Lithuania 771
63 Armenia 770
64 Moldova 752
65 Diamond Princess 712
66 Others 712
67 Bahrain 688
68 Hungary 678
69 Bosnia and Herzegovina 624
70 Cameroon 555
71 Tunisia 553
72 Kazakhstan 531
73 Azerbaijan 521
74 Lebanon 520
75 Latvia 509
76 Bulgaria 503
77 North Macedonia 483
78 Kuwait 479
79 Slovakia 471
80 Andorra 466
81 Belarus 440
82 Costa Rica 435
83 Cyprus 426
84 Uruguay 400
85 Taiwan 355
86 Albania 333
87 Jordan 323
88 Burkina Faso 318
89 Afghanistan 299
90 Cuba 288
91 Oman 277
92 Uzbekistan 266
93 Honduras 264
94 San Marino 259
95 Ivory Coast 245
96 Vietnam 240
97 Senegal 219
98 West Bank and Gaza 217
99 Nigeria 214
100 Malta 213
101 Ghana 205
102 Montenegro 201
103 Mauritius 196
104 Sri Lanka 166
105 Georgia 162
106 Venezuela 155
107 Congo (Kinshasa) 154
108 Niger 144
109 Kyrgyzstan 144
110 Bolivia 139
111 Brunei 135
112 Kosovo 135
113 Kenya 126
114 Cambodia 114
115 Guinea 111
116 Trinidad and Tobago 103
117 Rwanda 102
118 Paraguay 96
119 Liechtenstein 77
120 Madagascar 70
121 Bangladesh 70
122 Monaco 66
123 Guatemala 61
124 El Salvador 56
125 Guadeloupe 53
126 Jamaica 53
127 Barbados 52
128 Djibouti 50
129 Uganda 48
130 Reunion 45
131 Macau 43
132 Togo 41
133 Mali 41
134 Zambia 39
135 Ethiopia 38
136 Martinique 32
137 Eritrea 29
138 Bahamas 28
139 occupied Palestinian territory 25
140 Guyana 23
141 Palestine 22
142 Congo (Brazzaville) 22
143 Gabon 21
144 Burma 21
145 Republic of Ireland 21
146 Tanzania 20
147 Haiti 20
148 Maldives 19
149 Guinea-Bissau 18
150 Libya 18
151 French Guiana 18
152 Syria 16
153 Benin 16
154 Equatorial Guinea 16
155 Antigua and Barbuda 15
156 Mongolia 14
157 Dominica 14
158 Namibia 14
159 Saint Lucia 14
160 Grenada 12
161 Fiji 12
162 Liberia 10
163 Seychelles 10
164 Laos 10
165 Mozambique 10
166 Suriname 10
167 Sudan 10
168 Angola 10
169 MS Zaandam 9
170 Eswatini 9
171 Nepal 9
172 Chad 9
173 Saint Kitts and Nevis 9
174 Zimbabwe 9
175 Central African Republic 8
176 Somalia 7
177 Saint Vincent and the Grenadines 7
178 Cabo Verde 7
179 Mayotte 7
180 Holy See 7
181 Mauritania 6
182 Nicaragua 5
183 Bhutan 5
184 Gambia 4
185 Bahamas, The 4
186 Botswana 4
187 Belize 4
188 Aruba 4
189 Sierra Leone 4
190 Malawi 4
191 Guam 3
192 Burundi 3
193 Puerto Rico 3
194 Saint Barthelemy 3
195 Faroe Islands 2
196 ('St. Martin',) 2
197 St. Martin 2
198 Jersey 2
199 Cayman Islands 1
200 Channel Islands 1
201 Cape Verde 1
202 Gambia, The 1
203 Azerbaijan 1
204 North Ireland 1
205 Gibraltar 1
206 Vatican City 1
207 Greenland 1
208 Curacao 1
209 Republic of the Congo 1
210 Timor-Leste 1
211 Guernsey 1
212 The Gambia 1
213 The Bahamas 1
214 Papua New Guinea 1
215 East Timor 1

Covid Outbreak - Data Visualization

  • Visualization 1 - Bar plot of the 20 countries with the highest number of confirmed COVID19 cases.
In [14]:
#Bar plot for confirmed cases for top 20 countries 
fig = px.bar(Country_wise_Confirmed.sort_values('Confirmed', ascending=False).head(20), 
             x="Country", y="Confirmed", color='Confirmed', 
             height=800, width=1000,
             title='Number of Confirmed Cases in the top 20 countries')
#fig.update_traces(text=Country_wise_Confirmed['Confirmed'], textposition='outside')
fig.update_layout(uniformtext_minsize=10, uniformtext_mode='hide')
fig.show()
In [15]:
#Sort data based on recovered cases in the world
Country_wise_Recovered = sort_By_Confirmed_cases[sort_By_Confirmed_cases['Recovered']>0][['Country', 'Recovered']]
#Country_wise_Recovered['Recovered / 100 Cases'] = round((sort_By_Confirmed_cases['Recovered']/sort_By_Confirmed_cases['Recovered'])*100, 2)
Country_wise_Recovered.sort_values('Recovered', ascending=False).reset_index(drop=True).style.background_gradient(cmap='Greens')
Out[15]:
Country Recovered
0 Mainland China 76763
1 Spain 34219
2 Germany 26400
3 Italy 20996
4 Iran 19736
5 France 15572
6 US 14652
7 Switzerland 6415
8 South Korea 6325
9 Belgium 3247
10 Canada 2577
11 Austria 2507
12 Denmark 1379
13 Malaysia 915
14 Peru 914
15 Turkey 786
16 Australia 701
17 Thailand 674
18 Mexico 633
19 Diamond Princess 597
20 Chile 528
21 Japan 514
22 Luxembourg 500
23 Israel 427
24 Bahrain 423
25 Saudi Arabia 420
26 Iceland 396
27 Russia 333
28 Romania 329
29 Others 325
30 Finland 300
31 Singapore 297
32 Argentina 279
33 Netherlands 262
34 Iraq 259
35 Egypt 241
36 India 229
37 UK 215
38 Sweden 205
39 Hong Kong 173
40 Indonesia 150
41 Pakistan 131
42 New Zealand 127
43 Brazil 127
44 United Arab Emirates 125
45 Croatia 119
46 Poland 116
47 Qatar 109
48 Ecuador 100
49 Albania 99
50 South Africa 95
51 Uruguay 93
52 Kuwait 93
53 Algeria 90
54 Vietnam 90
55 Colombia 85
56 Slovenia 79
57 Greece 78
58 Czech Republic 78
59 Portugal 75
60 Jordan 74
61 Senegal 72
62 Burkina Faso 66
63 Morocco 66
64 Brunei 66
65 Oman 61
66 Estonia 59
67 Hungary 58
68 Philippines 57
69 Lebanon 54
70 Belarus 53
71 Venezuela 52
72 Cambodia 50
73 Taiwan 50
74 Armenia 43
75 Georgia 36
76 Kazakhstan 36
77 Bulgaria 34
78 Cyprus 33
79 Azerbaijan 32
80 Norway 32
81 Ghana 31
82 Bangladesh 30
83 Bosnia and Herzegovina 30
84 Moldova 29
85 Sri Lanka 27
86 San Marino 27
87 Uzbekistan 25
88 Nigeria 25
89 Ireland 25
90 Ivory Coast 25
91 Ukraine 25
92 West Bank and Gaza 21
93 Andorra 21
94 North Macedonia 20
95 Togo 17
96 Cameroon 17
97 Dominican Republic 16
98 Kosovo 16
99 Cuba 15
100 Guatemala 15
101 Costa Rica 13
102 Panama 13
103 Maldives 13
104 Paraguay 12
105 Macau 10
106 Afghanistan 10
107 Slovakia 10
108 Kyrgyzstan 9
109 Djibouti 8
110 Jamaica 7
111 Mauritius 7
112 Lithuania 7
113 French Guiana 6
114 Tunisia 5
115 Guinea 5
116 Kenya 4
117 Ethiopia 4
118 Monaco 3
119 Congo (Kinshasa) 3
120 Liberia 3
121 Honduras 3
122 Angola 2
123 Sudan 2
124 Bhutan 2
125 Namibia 2
126 Mongolia 2
127 Benin 2
128 Mauritania 2
129 Gambia 2
130 Congo (Brazzaville) 2
131 Zambia 2
132 Malta 2
133 El Salvador 2
134 Tanzania 2
135 Haiti 1
136 Nepal 1
137 Latvia 1
138 Saint Vincent and the Grenadines 1
139 Somalia 1
140 Mozambique 1
141 Montenegro 1
142 Trinidad and Tobago 1
143 Saint Lucia 1
144 Mali 1
145 Equatorial Guinea 1
146 Bolivia 1

Covid Outbreak - Data Visualization

  • Visualization 2 - Bar plot of the 20 countries with the highest number of recovered COVID19 cases.
In [16]:
#Bar plot for Recovered cases for top 20 countries 
fig = px.bar(Country_wise_Recovered.sort_values('Recovered', ascending=False).head(20), 
             x="Country", y="Recovered", color='Recovered', 
             height=800, width=1000,
             title='Number of Recovered Cases in the top 20 countries')
#fig.update_traces(text=Country_wise_Recovered['Recovered'], textposition='outside')
fig.update_layout(uniformtext_minsize=10, uniformtext_mode='hide')
fig.show()
In [17]:
#sorting the data based on deaths in country 
Country_wise_deaths = sort_By_Confirmed_cases[sort_By_Confirmed_cases['Deaths']>0][['Country', 'Deaths']]

Country_wise_deaths.sort_values('Deaths', ascending=False).reset_index(drop=True).style.background_gradient(cmap='Reds')
Out[17]:
Country Deaths
0 Italy 15362
1 Spain 11947
2 US 8407
3 France 7574
4 UK 4320
5 Iran 3452
6 Mainland China 3326
7 Netherlands 1656
8 Germany 1444
9 Belgium 1283
10 Switzerland 666
11 Turkey 501
12 Brazil 445
13 Sweden 373
14 Portugal 266
15 Canada 218
16 Indonesia 191
17 Austria 186
18 South Korea 177
19 Ecuador 172
20 Denmark 161
21 Romania 146
22 Philippines 144
23 Ireland 137
24 Algeria 130
25 India 86
26 Poland 79
27 Japan 77
28 Peru 73
29 Egypt 71
30 Dominican Republic 68
31 Greece 68
32 Norway 62
33 Mexico 60
34 Czech Republic 59
35 Morocco 59
36 Malaysia 57
37 Iraq 56
38 Serbia 44
39 Israel 44
40 Russia 43
41 Argentina 43
42 Panama 41
43 Pakistan 41
44 Ukraine 32
45 Hungary 32
46 Colombia 32
47 San Marino 32
48 Luxembourg 31
49 Australia 30
50 Saudi Arabia 29
51 Chile 27
52 Finland 25
53 Slovenia 22
54 Bosnia and Herzegovina 21
55 Albania 20
56 Thailand 20
57 Congo (Kinshasa) 18
58 Tunisia 18
59 North Macedonia 17
60 Bulgaria 17
61 Lebanon 17
62 Andorra 17
63 Burkina Faso 16
64 Honduras 15
65 Estonia 13
66 Croatia 12
67 Moldova 12
68 Cyprus 11
69 Lithuania 11
70 United Arab Emirates 10
71 Diamond Princess 10
72 Bolivia 10
73 South Africa 9
74 Cameroon 9
75 Niger 8
76 Bangladesh 8
77 Afghanistan 7
78 Venezuela 7
79 Mauritius 7
80 Armenia 7
81 Others 7
82 Trinidad and Tobago 6
83 Singapore 6
84 Cuba 6
85 Belarus 5
86 Uruguay 5
87 Taiwan 5
88 Sri Lanka 5
89 Jordan 5
90 Azerbaijan 5
91 Ghana 5
92 Kazakhstan 5
93 Nigeria 4
94 Hong Kong 4
95 Bahrain 4
96 Kenya 4
97 Iceland 4
98 Bahamas 4
99 Guyana 4
100 Paraguay 3
101 Jamaica 3
102 Togo 3
103 Mali 3
104 Qatar 3
105 El Salvador 3
106 MS Zaandam 2
107 Sudan 2
108 Syria 2
109 Guatemala 2
110 Congo (Brazzaville) 2
111 Angola 2
112 Montenegro 2
113 Senegal 2
114 Uzbekistan 2
115 Oman 2
116 Costa Rica 2
117 Suriname 1
118 Liberia 1
119 Zimbabwe 1
120 Cabo Verde 1
121 Burma 1
122 Gambia 1
123 Libya 1
124 Tanzania 1
125 Nicaragua 1
126 Mauritania 1
127 New Zealand 1
128 Gabon 1
129 Martinique 1
130 Zambia 1
131 Monaco 1
132 Liechtenstein 1
133 Kosovo 1
134 Brunei 1
135 Kyrgyzstan 1
136 Georgia 1
137 West Bank and Gaza 1
138 Ivory Coast 1
139 Slovakia 1
140 Kuwait 1
141 Latvia 1
142 Botswana 1

Covid Outbreak - Data Visualization

  • Visualization 3 - Bar plot showing the 20 countries with the highest number of deaths due to the COVID19 pandemic outbreak.
In [18]:
#Bar graph to plot country wise deaths for top 20 countries 
fig = px.bar(Country_wise_deaths.sort_values('Deaths', ascending=False).head(20), 
             x="Country", y="Deaths", color='Deaths', 
             height=600, width=1000,
             title='Number of Deaths in the World: top 20 countries')
#fig.update_traces(text=Country_wise_deaths['Deaths'], textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 4 - Scatter plot of Confirmed cases vs Deaths (log-log axes) for the 20 countries with the most deaths. It shows that the higher the number of confirmed cases, the higher the number of deaths.
In [19]:
#Scatter plot based on deaths vs confirmed cases 
fig = px.scatter(sort_By_Confirmed_cases.sort_values('Deaths', ascending=False).iloc[:20, :], 
                 x='Confirmed', y='Deaths', color='Country', size='Confirmed', height=800,
                 text='Country', log_x=True, log_y=True, title='Deaths cases vs Confirmed cases ')
fig.update_traces(textposition='top center')
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 5 - Death ratio and recovered ratio (deaths and recoveries as a share of confirmed cases)
In [20]:
#Group data by date to find the number of cases per day
Daily_cases = data.groupby(["Date"])[['Confirmed','Deaths', 'Recovered']].sum().reset_index()
#Sort value based on date 
sorted_By_Confirmed_cases_per_day=Daily_cases.sort_values('Date',ascending=False)
print(sorted_By_Confirmed_cases_per_day)
          Date  Confirmed   Deaths  Recovered
73  04/04/2020  1197405.0  64606.0   246152.0
72  04/03/2020  1095917.0  58787.0   225796.0
71  04/02/2020  1013157.0  52983.0   210263.0
70  04/01/2020   932605.0  46809.0   193177.0
69  03/31/2020   857487.0  42107.0   178034.0
..         ...        ...      ...        ...
4   01/26/2020     2118.0     56.0       52.0
3   01/25/2020     1438.0     42.0       39.0
2   01/24/2020      941.0     26.0       36.0
1   01/23/2020      653.0     18.0       30.0
0   01/22/2020      555.0     17.0       28.0

[74 rows x 4 columns]
In [21]:
###Plot the death ratio and the recovered ratio (each relative to confirmed cases) on twin y-axes
def Ratio_of_Death_Recovered(data_frame):
    figure, axes1 = plt.subplots(1,1,figsize=(20,7))
    axes1.plot(data_frame['Deaths']/data_frame['Confirmed'], 'r', label='Death Ratio')
    axes1.legend(loc='upper left')
    #rotate the x tick labels; the x-axis is the row index of the frame passed in
    axes1.tick_params(axis='x', labelrotation=75)
    axes1.set_ylabel('Death Ratio', fontsize=15, color='r')

    #second y-axis sharing the same x-axis
    axes2=axes1.twinx()
    axes2.plot(data_frame['Recovered']/data_frame['Confirmed'], 'g', label='Recovered Ratio')
    axes2.legend(loc='upper center')
    axes2.set_ylabel('Recovered Ratio', fontsize=15, color='g')
In [22]:
###Call the function that plots the death and recovered ratios
Ratio_of_Death_Recovered(sorted_By_Confirmed_cases_per_day)
In [23]:
#assigning variable to axes
x=Daily_cases.index

y=Daily_cases.Confirmed
y1=Daily_cases.Deaths
y2=Daily_cases.Recovered

Covid Outbreak - Data Visualization

  • Visualization 6 - Scatter plot of the number of cases on a daily basis (from 22-01-2020 to 04-04-2020), showing rapid growth of Confirmed Cases, especially in the month of March.
In [24]:
#calling seaborn library and set the style to whitegrid
sns.set(style="whitegrid")

# Initialize the matplotlib figure
f, ax = plt.subplots(figsize=(12,10 ))
#plot the scatter plot 
plt.scatter(x,y,color='Green' , label='Confirmed Cases')
plt.scatter(x,y1,color='red' ,label="Deaths Cases")
plt.scatter(x,y2,color='yellow',label="Recovered Cases")
plt.title("Increasing Coronavirus cases in the world per day.")
ax.legend(ncol=2, loc='upper left', frameon=True)
plt.show()

Covid Outbreak - Data Visualization

  • Visualization 7 - Table showing COVID-19 statistics for each day from 22 January to 4 April 2020.
In [25]:
#Coronavirus cases statistics per day 
sorted_By_Confirmed_cases_per_day.style.background_gradient(cmap='Reds')
Out[25]:
Date Confirmed Deaths Recovered
73 04/04/2020 1.19740e+06 64606 246152
72 04/03/2020 1.09592e+06 58787 225796
71 04/02/2020 1.01316e+06 52983 210263
70 04/01/2020 932605 46809 193177
69 03/31/2020 857487 42107 178034
68 03/30/2020 782365 37582 164566
67 03/29/2020 720117 33925 149082
66 03/28/2020 660706 30652 139415
65 03/27/2020 593291 27198 130915
64 03/26/2020 529591 23970 122150
63 03/25/2020 467594 21181 113770
62 03/24/2020 417966 18615 107705
61 03/23/2020 378287 16497 100958
60 03/22/2020 337020 14623 97243
59 03/21/2020 304528 12973 91676
58 03/20/2020 272167 11299 87403
57 03/19/2020 242713 9867 84962
56 03/18/2020 214915 8733 83313
55 03/17/2020 197168 7905 80840
54 03/16/2020 181546 7126 78088
53 03/15/2020 167447 6440 76034
52 03/14/2020 156099 5819 72624
51 03/13/2020 145193 5404 70251
50 03/12/2020 128343 4720 68324
49 03/11/2020 125865 4615 67003
48 03/10/2020 118582 4262 64404
47 03/09/2020 113582 3996 62512
46 03/08/2020 109835 3803 60695
45 03/07/2020 105836 3558 58359
44 03/06/2020 101800 3460 55866
43 03/05/2020 97886 3348 53797
42 03/04/2020 95124 3254 51171
41 03/03/2020 92844 3160 48229
40 03/02/2020 90309 3085 45602
39 03/01/2020 88371 2996 42716
38 02/29/2020 86013 2941 39782
37 02/28/2020 84124 2872 36711
36 02/27/2020 82756 2814 33277
35 02/26/2020 81397 2770 30384
34 02/25/2020 80415 2708 27905
33 02/24/2020 79570 2629 25227
32 02/23/2020 78985 2469 23394
31 02/22/2020 78599 2458 22886
30 02/21/2020 76843 2251 18890
29 02/20/2020 76199 2247 18177
28 02/19/2020 75641 2122 16121
27 02/18/2020 75138 2007 14352
26 02/17/2020 73260 1868 12583
25 02/16/2020 71226 1770 10865
24 02/15/2020 69032 1666 9395
23 02/14/2020 66887 1523 8058
22 02/13/2020 60370 1371 6295
21 02/12/2020 45222 1118 5150
20 02/11/2020 44803 1113 4683
19 02/10/2020 42763 1013 3946
18 02/09/2020 40151 906 3244
17 02/08/2020 37121 806 2616
16 02/07/2020 34392 719 2011
15 02/06/2020 30818 634 1487
14 02/05/2020 27636 564 1124
13 02/04/2020 23892 492 852
12 02/03/2020 19881 426 623
11 02/02/2020 16787 362 472
10 02/01/2020 12038 259 284
9 01/31/2020 9925 213 222
8 01/30/2020 8235 171 143
7 01/29/2020 6165 133 126
6 01/28/2020 5578 131 107
5 01/27/2020 2927 82 61
4 01/26/2020 2118 56 52
3 01/25/2020 1438 42 39
2 01/24/2020 941 26 36
1 01/23/2020 653 18 30
0 01/22/2020 555 17 28

Covid Outbreak - Data Visualization

  • Visualization 8 - Bar plot showing the increasing number of Confirmed cases worldwide, day by day, for the 20 most recent days
In [26]:
#Bar plot of the cumulative number of Confirmed cases in the world for the 20 most recent days 
fig = px.bar(sorted_By_Confirmed_cases_per_day.head(20), 
             x="Date", y="Confirmed", color='Confirmed', 
             height=600, width=1000,
             
             title='Increasing Number of Confirmed cases in World on daily basis')
fig.update_traces(text=sorted_By_Confirmed_cases_per_day['Confirmed'], textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 9 - Bar plot showing the increasing number of Recovered cases worldwide, day by day, for the 20 most recent days
In [27]:
#Bar plot of the cumulative number of Recovered cases in the world for the 20 most recent days 
fig = px.bar(sorted_By_Confirmed_cases_per_day.head(20), 
             x="Date", y="Recovered", color='Recovered', 
             height=600, width=1000,
             title='Increasing Number of Recovered  cases in World on daily basis')
fig.update_traces(text=sorted_By_Confirmed_cases_per_day['Recovered'], textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 10 - Bar plot showing the increasing number of Deaths worldwide, day by day, for the 20 most recent days
In [28]:
#Bar plot of the cumulative number of Deaths in the world for the 20 most recent days 
fig = px.bar(sorted_By_Confirmed_cases_per_day.head(20), 
             x="Date", y="Deaths", color='Deaths', 
             height=600, width=1000,
             title='Increasing Number of Deaths  cases in World on daily basis')
fig.update_traces(text=sorted_By_Confirmed_cases_per_day['Deaths'], textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 11 - Geographical map of the global COVID-19 outbreak, showing the number of confirmed cases in each country around the world
In [29]:
#geographical map based on Coronavirus spread 
fig = px.choropleth(sort_By_Confirmed_cases, locations="Country", locationmode='country names', 
                     color="Confirmed", hover_name="Country", range_color= [0, 40], projection="natural earth",
                    title='Coronavirus Spread across the world')
fig.update(layout_coloraxis_showscale=False)
fig.show()

Covid Outbreak - Data Visualization

  • Visualization 12 - Geographical map of the global COVID-19 outbreak, with each country coloured by the natural logarithm of its number of confirmed cases
In [30]:
#geographical map of confirmed cases, coloured on a log scale 
fig = px.choropleth(sort_By_Confirmed_cases, locations="Country", 
                    locationmode='country names', color=np.log(sort_By_Confirmed_cases["Confirmed"]), 
                    hover_name="Country", hover_data=['Confirmed'],
                    color_continuous_scale="Sunsetdark", 
                    title='Countries with Confirmed Cases')
fig.update(layout_coloraxis_showscale=False)
fig.show()

COVID19 Outbreak - Prediction using Machine Learning

  • We are going to use Prophet for prediction. Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects.

  • Benefits: accurate and fast, fully automatic, robust to missing data and shifts in the trend, and easy to use.

  • Limitations: it works best with time series that have strong seasonal effects and several seasons of historical data.

  • Usage: it takes only 2 inputs, "ds" (the dates of the time series) and "y" (the value being analysed).

  • Use the m.predict function of the Prophet package to forecast the development of confirmed cases over the next 5 days

  • Measure prediction accuracy using the mean absolute percentage error (MAPE), which suits tasks such as trend estimation. A Python function was written to calculate the MAPE value of the model

  • Using Prophet Cross validation to predict and again measure using MAPE

  • Polynomial Regression to Predict future cases

  • Narrowing down our analysis to Ireland COVID19 growth rate

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 1 - Extract Coronavirus worldwide development data (time series data) from the GitHub repository
In [31]:
#define https paths for the data sources; new daily data is added automatically, so it is easy for us to re-query results in future.
confirmed_global_path = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv'
deaths_global_path = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv'
recovered_global_path = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv' 

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 2: Transform the extracted data and organize data into new Dataframes
In [32]:
#create python functions for melting, renaming and transforming dates in the 3 available files.
#functions are used to avoid repeating the same work for each of the 3 files.
def organize_data(path_url,case_type):
    df = pd.read_csv(path_url)
    melted_df = df.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'])
    melted_df.rename(columns={"Province/State":"Province","Country/Region":"Country"},inplace=True)
    melted_df.rename(columns={"variable":"Date","value":case_type},inplace=True)
    melted_df["Date"] = pd.to_datetime(melted_df["Date"])
    return melted_df

def combine_data(confirm_df,recovered_df,deaths_df):
    combined_df = confirm_df.join(recovered_df['Recovered']).join(deaths_df['Deaths']) #join since countries index are the same
    return combined_df
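The index-based join in combine_data is only correct when the three melted frames list exactly the same Province/Country/Date rows in the same order. If one file has different rows, values from different places would be paired silently. A minimal sketch of a key-based alternative follows, assuming the column names produced by organize_data; the helper names and the sample rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

KEYS = ["Province", "Country", "Date"]

def _fill_province(df):
    # Missing provinces become empty strings so they can be used as merge keys.
    return df.assign(Province=df["Province"].fillna(""))

def combine_data_on_keys(confirm_df, recovered_df, deaths_df):
    # Hypothetical helper: align rows on Province/Country/Date instead of row position.
    combined = _fill_province(confirm_df)
    for other, column in ((recovered_df, "Recovered"), (deaths_df, "Deaths")):
        other = _fill_province(other)[KEYS + [column]]
        combined = combined.merge(other, on=KEYS, how="left")
    return combined

# Made-up sample rows; the second and third frames are deliberately in a different order.
day = pd.to_datetime(["2020-01-22", "2020-01-22"])
sample_confirmed = pd.DataFrame({"Province": [None, None], "Country": ["A", "B"],
                                 "Date": day, "Confirmed": [1, 2]})
sample_recovered = pd.DataFrame({"Province": [None, None], "Country": ["B", "A"],
                                 "Date": day, "Recovered": [20, 10]})
sample_deaths = pd.DataFrame({"Province": [None, None], "Country": ["B", "A"],
                              "Date": day, "Deaths": [200, 100]})

sample_combined = combine_data_on_keys(sample_confirmed, sample_recovered, sample_deaths)
print(sample_combined)
```

With how="left", rows that exist only in the confirmed frame are kept and their Recovered or Deaths values are left missing, which makes any mismatch between the files visible instead of hiding it.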
In [33]:
#call function to get and organize data into new dfs
confirmed_df = organize_data(confirmed_global_path,"Confirmed")
recovered_df = organize_data(recovered_global_path,"Recovered")
deaths_df = organize_data(deaths_global_path,"Deaths")

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 3: Load the data
In [34]:
#calling function to combine data from the previous created dfs
covid_df = combine_data(confirmed_df,recovered_df,deaths_df)
covid_df.head(2)
Out[34]:
Province Country Lat Long Date Confirmed Recovered Deaths
0 NaN Afghanistan 33.0000 65.0000 2020-01-22 0 0.0 0
1 NaN Albania 41.1533 20.1683 2020-01-22 0 0.0 0
In [35]:
#create the day-wise data frame required for time series predictions
df_daily_wise = covid_df.groupby("Date")[['Confirmed','Recovered', 'Deaths']].sum()
df_daily_wise.head(2)
Out[35]:
Confirmed Recovered Deaths
Date
2020-01-22 555 28.0 17
2020-01-23 654 30.0 18
In [36]:
#after grouping, reset the index so that Date becomes a regular column, as required for time series predictions
worldwide_cases = df_daily_wise.reset_index()
worldwide_cases.head(2)
Out[36]:
Date Confirmed Recovered Deaths
0 2020-01-22 555 28.0 17
1 2020-01-23 654 30.0 18

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 4: Extracting only Confirmed cases around the world and modifying dataframe to fit Prophet model
In [37]:
#since Prophet takes only 2 inputs, we focus the prediction analysis on confirmed cases and rename the columns as required
confirmed_cases = worldwide_cases[["Date","Confirmed"]]
confirmed_cases.columns = ['ds','y']   # renaming required for inputs into the Prophet model
confirmed_cases.tail(2)
Out[37]:
ds y
73 2020-04-04 1197405
74 2020-04-05 1272115

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 5: Training the data using Prophet model
In [38]:
#load model from Facebook Prophet package
#The easiest way to install Prophet is through conda-forge: conda install -c conda-forge fbprophet.
from fbprophet import Prophet
In [39]:
#separate training and testing datasets. After some tests, and due to the exponential growth rate, we decided to keep a prediction range of 5 days
train_df = confirmed_cases.loc[confirmed_cases.ds <= '2020-03-29']
test_df = confirmed_cases.loc[confirmed_cases.ds > '2020-03-29']



# initialize model
m = Prophet()

# train and fit the model on the training dataset using the m.fit() function from Prophet
m.fit(train_df)
INFO:fbprophet:Disabling yearly seasonality. Run prophet with yearly_seasonality=True to override this.
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[39]:
<fbprophet.forecaster.Prophet at 0x20716559f88>

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 6: Future prediction using Prophet function
In [40]:
#make future predictions. After testing several different periods, we noted that a 5-day window works better
#Prophet function m.make_future_dataframe
future = m.make_future_dataframe(periods=5)
future.tail(5)
Out[40]:
ds
68 2020-03-30
69 2020-03-31
70 2020-04-01
71 2020-04-02
72 2020-04-03

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 7: Forecasting the number of confirmed cases for the next 5 future dates
In [41]:
# utilize m.predict function for Prophet package to forecast future 5 days confirmed cases development
forecast = m.predict(future)
m.plot(forecast)
Out[41]:

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 8: The predict method assigns each row in future a predicted value, which it names yhat
In [42]:
#check the table of predicted values; Prophet names the prediction yhat and reports lower and upper bounds
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(5)
Out[42]:
ds yhat yhat_lower yhat_upper
68 2020-03-30 662129.394297 637132.652917 687265.264613
69 2020-03-31 695974.909653 670065.171873 719695.782263
70 2020-04-01 732176.105231 706128.808766 759222.071642
71 2020-04-02 770218.803156 744489.740810 798548.963612
72 2020-04-03 809345.093373 783399.098712 838651.964396

COVID19 Outbreak - Prophet Modelling Prediction

  • Step 9: To compare the predicted values (yhat) against the testing data frame, we merge them into a single dataframe
In [43]:
#to compare the predicted values (yhat) against the testing data frame, let's merge them
comparison_df = forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(5)
comparison_df = pd.merge(comparison_df, test_df,on="ds")
comparison_df.tail(5)
Out[43]:
ds yhat yhat_lower yhat_upper y
0 2020-03-30 662129.394297 637132.652917 687265.264613 782365
1 2020-03-31 695974.909653 670065.171873 719695.782263 857487
2 2020-04-01 732176.105231 706128.808766 759222.071642 932605
3 2020-04-02 770218.803156 744489.740810 798548.963612 1013157
4 2020-04-03 809345.093373 783399.098712 838651.964396 1095917

COVID19 Outbreak - Measure of Model accuracy (MAPE)

  • Step 10: We will use the mean absolute percentage error (MAPE), a measure of the prediction accuracy of a forecasting method, for example in trend estimation. Since it fits our needs, the following Python function was written to calculate the MAPE value of the model

  • This function is based on the article written by Ruan van der Merwe (2018), available at: https://towardsdatascience.com/implementing-facebook-prophet-efficiently-c241305405a3

In [44]:
def mean_absolute_percentage_error(y_true, y_pred):
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
In [45]:
#call function to calculate MAPE
mape = mean_absolute_percentage_error(comparison_df.y, comparison_df.yhat)
print('MAPE: \n', mape.round(2),"%")
MAPE: 
 21.16 %
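The reported MAPE can be checked by hand from the comparison table printed in Out[43]. The sketch below copies the five actual values (y) and forecasts (yhat) from that table and applies the same formula in plain Python; the variable names are illustrative only.

```python
# Actual values (y) and Prophet forecasts (yhat) copied from the comparison table.
y_true = [782365, 857487, 932605, 1013157, 1095917]
y_hat = [662129.394297, 695974.909653, 732176.105231, 770218.803156, 809345.093373]

# Absolute percentage error of each forecast day.
ape = [abs(actual - predicted) / actual * 100 for actual, predicted in zip(y_true, y_hat)]

# MAPE is the mean of the per-day absolute percentage errors.
mape_check = sum(ape) / len(ape)

for day, error in enumerate(ape, start=1):
    print(f"day {day}: {error:.2f}%")
print(f"MAPE: {mape_check:.2f}%")
```

The per-day error grows from roughly 15% on the first forecast day to roughly 26% on the fifth, so the model under-estimates the actual values more with each day of the horizon.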

COVID19 Outbreak - Measure of Model accuracy (MAPE)

  • Step 10: Interpreting the MAPE results

  • A MAPE of 21.16% indicates that, over all the predicted points, the forecast is off by an average of 21.16% from the actual value.

  • Prophet allows the user to tune the model. For example, the growth parameter defaults to "linear". For testing, we changed it to "logistic", but this requires a maximum value ("cap") for "y". For our study, that means having a domain expert establish a maximum expected number of confirmed cases, which we felt did not make sense in the current world scenario.

COVID19 Outbreak - Prophet Cross-Validation function

  • Step 11: With the goal of tuning our model, we will use Prophet's built-in cross validation function. The function trains the model on an initial period and then predicts over a specified horizon. Prophet then trains on a longer period and predicts again, repeating until the end point is reached.
In [46]:
#import function
from fbprophet.diagnostics import cross_validation
In [47]:
# new dataframe created by the cross validation function using the previous model "m" 
#We specify the forecast horizon "horizon"; "initial" and "period" are optional 
df_cv = cross_validation(m, horizon = '5 days')
df_cv.tail()
INFO:fbprophet:Making 19 forecasts with cutoffs between 2020-02-08 00:00:00 and 2020-03-24 00:00:00
INFO:fbprophet:n_changepoints greater than number of observations. Using 13.
INFO:fbprophet:n_changepoints greater than number of observations. Using 15.
INFO:fbprophet:n_changepoints greater than number of observations. Using 17.
INFO:fbprophet:n_changepoints greater than number of observations. Using 19.
INFO:fbprophet:n_changepoints greater than number of observations. Using 21.
INFO:fbprophet:n_changepoints greater than number of observations. Using 23.
Out[47]:
ds yhat yhat_lower yhat_upper y cutoff
90 2020-03-25 371684.686413 354093.098678 388740.214889 467653 2020-03-24
91 2020-03-26 390200.150625 372312.564405 407549.476102 529591 2020-03-24
92 2020-03-27 409724.353060 392136.833175 426868.473866 593291 2020-03-24
93 2020-03-28 428587.223519 411842.260374 445216.256484 660706 2020-03-24
94 2020-03-29 447771.992493 430071.506321 465034.634012 720117 2020-03-24
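The cutoff dates reported in the log can be reproduced by hand. The sketch below assumes Prophet's documented defaults when only the horizon is given (initial = 3 × horizon, period = 0.5 × horizon) and assumes the training data runs from 2020-01-22 to 2020-03-29, as in train_df. The helper simulated_cutoffs is hypothetical and is not part of the Prophet API.

```python
from datetime import datetime, timedelta

def simulated_cutoffs(first_day, last_day, horizon, initial=None, period=None):
    # Hypothetical helper that mimics a rolling-origin cutoff schedule.
    initial = initial if initial is not None else 3 * horizon   # assumed default
    period = period if period is not None else horizon / 2      # assumed default
    cutoffs = []
    cutoff = last_day - horizon            # the last cutoff leaves one full horizon to predict
    while cutoff >= first_day + initial:   # keep at least the initial period for training
        cutoffs.append(cutoff)
        cutoff -= period                   # step the cutoff back by one period
    return sorted(cutoffs)

cutoffs = simulated_cutoffs(datetime(2020, 1, 22), datetime(2020, 3, 29), timedelta(days=5))
print(len(cutoffs), cutoffs[0], cutoffs[-1])
```

Under these assumptions the schedule has 19 cutoffs between 2020-02-08 and 2020-03-24, which agrees with the "Making 19 forecasts" log message. Each cutoff produces one 5-day forecast, which is why df_cv ends at index 94 (95 rows).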

COVID19 Outbreak - Prophet Cross-Validation function

  • Step 12: Measure the MAPE after running the cross validation
In [48]:
#let's measure the MAPE after running the cross validation
mape = mean_absolute_percentage_error(df_cv.y, df_cv.yhat)
print('MAPE: \n', mape.round(2),"%")
MAPE: 
 18.41 %

COVID19 Outbreak - Changepoint function to identify changes in trends

  • Step 13: Prophet can also add changepoints to the forecast plot to show the points at which the trend changed. As seen below, the trend changes constantly within this period.
In [49]:
from fbprophet.plot import add_changepoints_to_plot
In [50]:
# using add_changepoints_to_plot function
fig = m.plot(forecast)
c = add_changepoints_to_plot(fig.gca(),m,forecast)

COVID19 Outbreak - Polynomial Regression Modelling

  • Step 14: Polynomial Regression is a regression algorithm that models the relationship between a dependent variable (y) and an independent variable (x) as an nth-degree polynomial. It uses a linear regression model, applied to powers of x, to fit complicated, non-linear functions and datasets. In our case, where the data points are arranged in a non-linear fashion, we need the Polynomial Regression model.
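To illustrate why polynomial regression is still a linear model, the sketch below expands x into the columns 1, x, x² and then solves an ordinary least-squares problem on those columns. It uses NumPy only and synthetic data that follows y = 1 + 2x + 3x² exactly; the data and variable names are assumptions for illustration, not the COVID-19 series.

```python
import numpy as np

# Synthetic data that follows y = 1 + 2x + 3x^2 exactly (illustrative only).
x_demo = np.arange(10, dtype=float)
y_demo = 1 + 2 * x_demo + 3 * x_demo ** 2

# Polynomial features: one column per power of x (x^0, x^1, x^2).
x_demo_poly = np.vander(x_demo, N=3, increasing=True)

# Ordinary least squares on the expanded columns is a plain linear regression.
coefficients, *_ = np.linalg.lstsq(x_demo_poly, y_demo, rcond=None)
print(np.round(coefficients, 6))
```

The fitted coefficients recover the intercept and the weights of x and x² used to generate the data. The model is linear in its coefficients even though the fitted curve is not linear in x.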

Sklearn

Scikit-learn, a.k.a. sklearn, is a free machine learning library for the Python programming language. It features different ML algorithms; we will use its linear regression in the following analysis.

In [51]:
#import the sklearn pieces used below
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
In [52]:
#create the day-wise data frame required for time series predictions, with the index reset
daily_cases = covid_df.groupby(["Date"])[['Confirmed','Deaths', 'Recovered']].sum().reset_index()
daily_cases.head(2)
Out[52]:
Date Confirmed Deaths Recovered
0 2020-01-22 555 17 28.0
1 2020-01-23 654 18 30.0
In [53]:
#separate the regression variables: x is the day index and y is the cumulative number of confirmed cases
x_df=pd.DataFrame(daily_cases.index)
y_df=pd.DataFrame(daily_cases.Confirmed)
In [54]:
#training and test split from sklearn functions
x_train,x_test,y_train,y_test=train_test_split(x_df,y_df,test_size=0.1,random_state=0)
In [55]:
#create the linear regression object and polynomial features of degree 5 
#The following code was inspired by the work published by Abdulrhman Alothman (2020), available at Kaggle.
poly_reg=PolynomialFeatures(degree=5)
x_poly=poly_reg.fit_transform(x_train)
lin_reg2=LinearRegression()
lin_reg2.fit(x_poly,y_train)
Out[55]:
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
In [56]:
#model graphic for polynomial regression
cases_per_Day = covid_df.groupby(["Date"])[['Confirmed','Deaths', 'Recovered']].sum().reset_index()
sorted_By_Confirmed1=cases_per_Day.sort_values('Date',ascending=False)

x=cases_per_Day.index

y=cases_per_Day.Confirmed

plt.scatter(x,y,color='red')
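#sort the test points so the fitted curve is drawn left to right; transform (not fit_transform) reuses the fitted features
x_sorted = x_test.sort_values(by=x_test.columns[0])
plt.plot(x_sorted,lin_reg2.predict(poly_reg.transform(x_sorted)),color='blue')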
plt.title("Polynomial Regression Model ")
plt.show()
In [57]:
#now, let's test the algorithm on the held-out points (transform reuses the already fitted polynomial features)
y_pred=lin_reg2.predict(poly_reg.transform(x_test))

result=pd.DataFrame(y_pred)
result['Real Value']=y_test.iloc[:,:].values
result['Predicted Value']=pd.DataFrame(y_pred)
result=result[['Real Value','Predicted Value']]
result
Out[57]:
Real Value Predicted Value
0 156101 1.579513e+05
1 242500 2.553002e+05
2 60368 6.003677e+04
3 1272115 1.321485e+06
4 75639 7.558483e+04
5 73258 7.154052e+04
6 660706 6.346463e+05
7 181574 1.897858e+05
In [58]:
print('Polynomial Regession  R2 Score   : ',r2_score(y_test, y_pred))
Polynomial Regession  R2 Score   :  0.9973364643211764
In [59]:
#let's measure the MAPE of the polynomial regression on the test set
mape = mean_absolute_percentage_error(result['Real Value'], result['Predicted Value'])
print('MAPE: \n', mape.round(2),"%")
MAPE: 
 2.72 %
In [60]:
#today is 04/04/2020
print("After {0} day will be {1} case in the world".format((75-len(cases_per_Day)),lin_reg2.predict(poly_reg.transform([[75]]))))
print("After {0} day will be {1} case in the world".format((77-len(cases_per_Day)),lin_reg2.predict(poly_reg.transform([[77]]))))
print("After {0} day will be {1} case in the world".format((87-len(cases_per_Day)),lin_reg2.predict(poly_reg.transform([[87]]))))
After 0 day will be [[1438815.35951321]] case in the world
After 2 day will be [[1698348.25250211]] case in the world
After 12 day will be [[3598257.36621131]] case in the world

COVID19 Outbreak - Growth Rate Analysis for a country

  • Step 15: Ireland COVID-19 development growth rate
In [61]:
#Since we want to narrow down our analysis and explore similarities between countries, import the relevant data 
#Country indicators "Population, GDP" available at: World Bank Data - https://data.worldbank.org/
global_population_path = 'C:/Users/alexz/Desktop/Business Analytics/2nd Semester/Advanced Programming for Business Analytics/Assignments/Group Assignment/WorldDevelopmentIndicator_Population.csv'
global_gdp_path = 'C:/Users/alexz/Desktop/Business Analytics/2nd Semester/Advanced Programming for Business Analytics/Assignments/Group Assignment/WorldDevelopmentIndicator_GDP.csv'
#countries Human Development Index "HDI" available at: http://hdr.undp.org/en/indicators/137506#
global_hdi_path = 'C:/Users/alexz/Desktop/Business Analytics/2nd Semester/Advanced Programming for Business Analytics/Assignments/Group Assignment/UnitedNation_HumanDevelopmentIndex.csv'
In [62]:
#Since we have 3 different, large files to clean and merge "over 265 rows and 133 columns in 3 files" 
#create python functions to automate the reading, cleaning and joining procedures "only 2017 data = most up to date and complete"
def read_n_clean_data(path_url,indicator_type):
    df = pd.read_csv(path_url)
    df.rename(columns={df.columns[0]:"Country"},inplace=True)
    df = df[['Country', '2017']]
    df.rename(columns={df.columns[1]:indicator_type},inplace=True)
    return df

def combine_data(population_df,gdp_df,hdi_df):
    first_df = pd.merge(population_df, gdp_df, on="Country") #first merge: all the same 263 rows, same web source: World Bank
    combined_df = pd.merge(first_df, hdi_df,on="Country", how="left") #second merge with how="left", since the HDI data comes from a different source without the 256 rows
    return combined_df
In [63]:
#call functions to read and clean the csv datasets "over 263 rows and 158 columns in 3 files" 
population_df = read_n_clean_data(global_population_path,"population")
gdp_df = read_n_clean_data(global_gdp_path,"gdp")
hdi_df = read_n_clean_data(global_hdi_path,"hdi")
In [64]:
#call combine_data function to merge imported and cleaned data
indicator_df = combine_data(population_df,gdp_df,hdi_df)
indicator_df.tail()
Out[64]:
Country population gdp hdi
259 Kosovo 1830700.0 7.227700e+09 NaN
260 Yemen, Rep. 27834821.0 2.681870e+10 NaN
261 South Africa 57000451.0 3.495540e+11 0.704
262 Zambia 16853688.0 2.586814e+10 0.589
263 Zimbabwe 14236745.0 2.281301e+10 0.553
In [65]:
#since we want to filter by continent = Asia "COVID-19 initial hub", get a country vs continent table from GitHub
country_continent_path = 'https://raw.githubusercontent.com/dbouquin/IS_608/master/NanosatDB_munging/Countries-Continents.csv'
df_continents = pd.read_csv(country_continent_path)
df_continents = df_continents[["Country","Continent"]]
df_continents.tail(2)
Out[65]:
Country Continent
192 Uruguay South America
193 Venezuela South America
In [66]:
# final pandas merge to join the continent information. Merge with how="left" to keep the complete dataset, even where values are missing.
countries_df = pd.merge(indicator_df, df_continents,on="Country", how="left")
countries_df.loc[countries_df['Country'] == "Ireland"]
Out[66]:
Country population gdp hdi Continent
109 Ireland 4807388.0 3.348340e+11 0.939 Europe
In [67]:
#Narrowing down the analysis, we identify countries with indicators similar to Ireland's and compare COVID-19 growth
#hdi is converted to numbers before comparing, so the filter does not depend on string ordering
countries_df.loc[(countries_df['population'] >= 4000000) & (countries_df['population'] <= 6000000) & 
                 (countries_df['Continent'].isin(['Asia', 'Oceania'])) &
                 (pd.to_numeric(countries_df['hdi'], errors='coerce') >= 0.9)]
Out[67]:
Country population gdp hdi Continent
178 New Zealand 4793900.0 2.025910e+11 0.92 Oceania
206 Singapore 5612253.0 3.384060e+11 0.934 Asia
In [68]:
#now that we have identified similar countries, we are interested in the Asian one, since it was hit earlier by the outbreak.
#for the next step of our analysis, we want to identify the weekly growth rate of cases and plot it for comparison
#the following function was created to reduce code length and to make it easy to query information for different countries
def calculate_growth_rate(df,country):
    df = df.loc[(df["Country"]) == country] #get covid global dataset and filter by country parameter input
    df = df.copy()                          #create a copy to avoid the warning about assigning into a sliced DF
    df["Weeknum"] = df["Date"].dt.isocalendar().week.astype(int) #ISO week number of the reported date (Series.dt.week is deprecated in newer pandas)
    df_weekly_growth = df.groupby(["Weeknum"])['Confirmed'].max().reset_index() #since values are reported as cumulative totals, group and take the max value for each week
    df_weekly_growth['GrowthRate'] = df_weekly_growth.Confirmed.pct_change().mul(100).round(2) #pct_change() calculates the percentage change between the current and the prior element   
    df_weekly_growth["GrowthRate"] = df_weekly_growth["GrowthRate"].replace(np.inf, np.nan) # replace infinite results (division by zero) by NaN
    df_weekly_growth["GrowthRate"] = df_weekly_growth["GrowthRate"].replace(np.nan, 0) # replace NaN by 0 for plotting and column formatting requirements
    return df_weekly_growth
In [69]:
#call the created function to get the Ireland and Singapore weekly growth rates as dataframes
weekly_growth_singapore = calculate_growth_rate(covid_df,"Singapore")
weekly_growth_ireland = calculate_growth_rate(covid_df,"Ireland")
weekly_growth_singapore # view weekly growth rate table for singapore
Out[69]:
Weeknum Confirmed GrowthRate
0 4 4 0.00
1 5 18 350.00
2 6 40 122.22
3 7 75 87.50
4 8 89 18.67
5 9 106 19.10
6 10 150 41.51
7 11 226 50.67
8 12 455 101.33
9 13 844 85.49
10 14 1309 55.09
In [70]:
# view weekly growth rate table for Ireland
weekly_growth_ireland
Out[70]:
Weeknum Confirmed GrowthRate
0 4 0 0.00
1 5 0 0.00
2 6 0 0.00
3 7 0 0.00
4 8 0 0.00
5 9 1 0.00
6 10 19 1800.00
7 11 129 578.95
8 12 906 602.33
9 13 2615 188.63
10 14 4994 90.98
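The GrowthRate columns in the two tables can be verified by hand. The sketch below recomputes the week-over-week percentage change in plain Python for the first weekly totals of each table (Singapore weeks 4 to 7, Ireland weeks 8 to 11). The helper weekly_growth_rates is hypothetical and mirrors the conventions of calculate_growth_rate: the first week, and any week whose previous total is zero, is reported as 0.

```python
def weekly_growth_rates(weekly_totals):
    # Hypothetical helper: percentage change between consecutive weekly totals.
    rates = [0.0]  # the first week has no previous week to compare with
    for previous, current in zip(weekly_totals, weekly_totals[1:]):
        if previous == 0:
            rates.append(0.0)  # growth from zero is undefined, reported as 0
        else:
            rates.append(round((current - previous) / previous * 100, 2))
    return rates

singapore_check = weekly_growth_rates([4, 18, 40, 75])   # Singapore, weeks 4 to 7
ireland_check = weekly_growth_rates([0, 1, 19, 129])     # Ireland, weeks 8 to 11
print(singapore_check)
print(ireland_check)
```

For example, Singapore's rise from 4 to 18 cases is (18 - 4) / 4 × 100 = 350%. Ireland's rise from 0 to 1 case in week 9 is shown as 0 because a percentage change from zero is undefined, so the first very large value (1800%) only appears in week 10.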
In [71]:
#plotting Singapore growth rate curve
plt.plot(weekly_growth_singapore["Weeknum"], weekly_growth_singapore["GrowthRate"])
plt.title("Singapore Weekly Growth Rate Curve")
plt.show()
In [72]:
#plotting Ireland growth rate curve. The first cases were reported only in week 9 "corresponding to week 4 for Singapore",
#so we can expect weekly cases to keep growing, although at a lower rate, due to the lockdown measures in place in the country.
plt.plot(weekly_growth_ireland["Weeknum"], weekly_growth_ireland["GrowthRate"])
plt.title("Ireland Weekly Growth Rate Curve")
plt.show()

Final considerations

Considering the current world scenario, in which different governments have taken different public health measures, predicting the worldwide spread of coronavirus is a very challenging task. It raises further questions to be investigated, for example the relevance of environmental factors (weather), population density, or even nearby airport traffic.